Diskann Benchmarking Wrapper #260

tarang-jain · 2024-07-29T22:57:17Z

Brings DiskANN into cuvs-bench

Build and search in-memory DiskANN index
Build and search SSD DiskANN index
Build a cuvs Vamana index on GPU and serialize it in DiskANN format. Search on CPU using in-memory DiskANN search API.


bkarsin · 2024-12-03T20:25:56Z

cpp/bench/ann/CMakeLists.txt

@@ -32,6 +32,12 @@ option(CUVS_ANN_BENCH_USE_CUVS_BRUTE_FORCE "Include cuVS brute force knn in benc
 option(CUVS_ANN_BENCH_USE_CUVS_CAGRA_HNSWLIB "Include cuVS CAGRA with HNSW search in benchmark" ON)
 option(CUVS_ANN_BENCH_USE_HNSWLIB "Include hnsw algorithm in benchmark" ON)
 option(CUVS_ANN_BENCH_USE_GGNN "Include ggnn algorithm in benchmark" OFF)
+option(CUVS_ANN_BENCH_USE_DISKANN "Include DISKANN search in benchmark" ON)
+option(CUVS_ANN_BENCH_USE_CUVS_VAMANA "Include cuVS Vamana with DiskANN search in benchmark" ON)
+if(CMAKE_SYSTEM_PROCESSOR MATCHES "(ARM|arm|aarch64)")


Does MSFT DiskANN repo not support ARM?

Correct, it does not. DiskANN has mkl-devel as a dependency, which is not meant to be installed on aarch64.

cpp/bench/ann/src/cuvs/cuvs_cagra_diskann_wrapper.h

bkarsin · 2024-12-03T20:33:11Z

cpp/bench/ann/src/diskann/diskann_wrapper.h

+template <typename T>
+void diskann_ssd<T>::build(const T* dataset, size_t nrow)
+{
+  diskann::build_disk_index<float>(base_file_.c_str(),


Where is this defined? Is it using the CPU DiskANN repo to build the SSD version of the index?

Yes, that is right. It is just to benchmark the CPU build.

cjnolet · 2024-12-05T00:50:27Z

/ok to test

tarang-jain · 2024-12-05T08:00:43Z

/ok to test


cjnolet · 2024-12-12T01:12:51Z

/ok to test

achirkin

Please avoid algorithm-specific changes to cpp/bench/ann/src/common/benchmark.hpp as per our offline discussion.
I'm not opposed to exposing dataset/index filepaths to algorithms in general though.
But maybe it would make sense to hide (filter out) them when dumping all build parameters to avoid cluttering the console/reports and exposing private user filepaths?
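
The filtering idea above can be sketched as follows. This is a minimal illustration, not the benchmark's actual reporting code: `build_params`, `hidden_keys`, and `dump_build_params` are hypothetical names, the key names are assumptions, and a `std::map` stands in for whatever parameter object the benchmark really uses.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the benchmark's build-parameter object.
using build_params = std::map<std::string, std::string>;

// Keys treated as private file paths (illustrative names, an assumption).
inline const std::set<std::string>& hidden_keys()
{
  static const std::set<std::string> keys{"dataset_file", "index_file"};
  return keys;
}

// Render the parameters as "key=value" pairs, skipping every hidden key so
// that user file paths do not reach the console or the report.
inline std::string dump_build_params(const build_params& params)
{
  std::ostringstream out;
  bool first = true;
  for (const auto& kv : params) {
    if (hidden_keys().count(kv.first) != 0) { continue; }
    if (!first) { out << ", "; }
    out << kv.first << "=" << kv.second;
    first = false;
  }
  return out.str();
}

// Usage: the dataset path is stored in the parameters but left out of the dump.
inline void example_dump()
{
  const build_params p{{"R", "64"}, {"L_build", "128"}, {"dataset_file", "/data/base.fbin"}};
  assert(dump_build_params(p) == "L_build=128, R=64");
}
```

With this shape, the algorithm still receives the paths through its parameters, while the reporting path simply never prints them.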

cpp/bench/ann/src/cuvs/cuvs_vamana_wrapper.h


tarang-jain · 2025-01-25T05:12:44Z

@achirkin are you suggesting that we expose the dataset / index filepaths to all algorithms, but only use them for diskann ssd indexes?

tarang-jain · 2025-01-25T09:15:57Z

cpp/bench/ann/src/common/benchmark.hpp

@@ -144,7 +150,8 @@ void bench_build(::benchmark::State& state,

  const auto algo_property = parse_algo_property(algo->get_preference(), index.build_param);

-  const T* base_set      = dataset->base_set(algo_property.dataset_memory_type);
+  const T* base_set = nullptr;
+  if (index.algo != "diskann_ssd") base_set = dataset->base_set(algo_property.dataset_memory_type);


@achirkin if we do not have this line, the entire dataset will be read into the base_set variable needlessly for DiskANN SSD-based indexes. For these indexes, only the path to the dataset needs to be known, and the DiskANN index-building APIs will read the data from that path. In fact, it is important that we do it this way, because the SSD indexes are designed to let the data remain on disk for extremely large datasets.


achirkin

Please try to always adhere to these two rules:

1. No algorithm-specific code in the common header files, such as benchmark.hpp.
2. All configuration-specific code should go into conf.hpp, not into benchmark.hpp.

achirkin · 2025-01-27T08:15:41Z

You can just set the dataset_memory_type to kHostMmap by default for diskann index. It will lazily map the file content, so it won't read anything until you actually use the pointer.
It will still open the file and read the first two words anyway - independently of this change. That is because the benchmark reads and reports the dataset dimensionality.
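
To illustrate why a memory-mapped dataset pointer is cheap to obtain, here is a minimal POSIX-only sketch. It is not the benchmark's own dataset class: `mapped_file`, `dataset_header`, and `read_header` are hypothetical names, and the header layout (two 32-bit words holding the row count and the dimension, followed by the vectors) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only, lazily paged view of a file. Hypothetical helper for illustration.
class mapped_file {
 public:
  explicit mapped_file(const std::string& path)
  {
    const int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) { throw std::runtime_error("cannot open " + path); }
    struct stat st {};
    if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
      ::close(fd);
      throw std::runtime_error("cannot stat (or empty file): " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    // mmap only sets up the mapping; pages are read from disk the first
    // time they are touched.
    void* ptr = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (ptr == MAP_FAILED) { throw std::runtime_error("mmap failed for " + path); }
    data_ = static_cast<const unsigned char*>(ptr);
  }
  ~mapped_file() { ::munmap(const_cast<unsigned char*>(data_), size_); }
  mapped_file(const mapped_file&)            = delete;
  mapped_file& operator=(const mapped_file&) = delete;

  const unsigned char* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  const unsigned char* data_ = nullptr;
  std::size_t size_          = 0;
};

struct dataset_header {
  std::uint32_t n_rows;
  std::uint32_t n_cols;
};

// Assumed layout: two 32-bit words (row count, dimension), then the vectors.
inline dataset_header read_header(const mapped_file& f)
{
  assert(f.size() >= sizeof(dataset_header));
  dataset_header h{};
  std::memcpy(&h, f.data(), sizeof(h));  // touches only the first page
  return h;
}
```

Under these assumptions, reading the header faults in a single page, and the rest of the file stays on disk until some code dereferences the pointer further in.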

achirkin · 2025-01-27T08:34:31Z

cpp/bench/ann/src/common/benchmark.hpp

+  if (index.algo == "diskann_ssd") {
+    make_sure_parent_dir_exists(index.file);
+    index.build_param["dataset_file"]  = dataset->base_filename();
+    index.build_param["path_to_index"] = index.file;
+  }
+


I think you shouldn't need the path_to_index parameter, because the index member file is accessible from within the index in your algorithm-specific code.
If you need the dataset file path during index build, you can either inject it as a build param in conf.hpp, similar to how we inject some search parameters in parse_index(...), or create one more member in configuration::index - all in conf.hpp; I think it's ok to have it available to all algorithms at that point.

Unfortunately, the cuvs::bench::configuration::index object's attributes are not directly passed down to the specific algorithm's wrapper class, so we would need the path_to_index. The algorithm-specific structs use only the parameters that are explicitly passed down from the configuration object.

My question is: CAN we pass the cuvs::bench::configuration::Index down into the wrappers? Or rather, any reason we shouldn't do it?
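
One way to read the suggestion in this thread is sketched below: the dataset path is injected once, at configuration time, for every algorithm, and only the wrapper that needs it reads it back. All types and function names here (`index_config`, `inject_dataset_file`, `ssd_build_param`, `parse_ssd_build_param`) are hypothetical simplifications, not the actual contents of conf.hpp, and a `std::map` stands in for the real parameter object.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified mirror of the benchmark's index configuration.
struct index_config {
  std::string algo;
  std::string file;  // where the index is stored
  std::map<std::string, std::string> build_param;
};

// Configuration-time step: every index receives the dataset path as a build
// parameter, so the common benchmark loop never branches on the algorithm name.
inline void inject_dataset_file(index_config& index, const std::string& dataset_file)
{
  // emplace does not overwrite a value that is already present.
  index.build_param.emplace("dataset_file", dataset_file);
}

// Hypothetical wrapper-side parameter struct, filled from build_param.
struct ssd_build_param {
  std::string dataset_file;
};

inline ssd_build_param parse_ssd_build_param(const index_config& index)
{
  const auto it = index.build_param.find("dataset_file");
  assert(it != index.build_param.end() && "dataset_file must be injected first");
  return ssd_build_param{it->second};
}
```

Algorithms that do not need the path simply ignore the extra key, which keeps the decision about using it inside the algorithm-specific code.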

achirkin · 2025-01-27T08:36:52Z

cpp/bench/ann/src/common/benchmark.hpp

+  if (index.algo != "diskann_ssd")
+    filename = index.file;
+  else
+    filename = index.file + "_disk.index";


Same problem, the algorithm-specific code should go to the algorithm headers.

This is just for checking if the index file exists. The diskann ssd index is saved with the suffix _disk.index along with the filename.

This is algorithm-specific code in the common benchmarking code, though. It should be moved into the DiskANN portions of the code. You have to understand that we create generalized abstractions for this very purpose: anything in common/benchmark.hpp should apply to all index algorithms, and anything specific should be propagated down to the respective index types. Polluting the common code like this moves specifics up the abstraction tree, rather than down, and it results in hard-to-maintain spaghetti code.

This is another reason to propagate this down to the individual wrappers. You could at least specify a method on the wrapper class to allow any algorithms to override and adjust the filename as needed. That would be more robust than one-offs like this.
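
The overridable-method idea from the comment above could look roughly like this. It is a sketch under stated assumptions: `algo_base`, `diskann_ssd_sketch`, `index_file_on_disk`, and `file_to_check` are hypothetical names chosen for the example, not the benchmark's real wrapper interface. Only the `_disk.index` suffix comes from the thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down wrapper interface.
template <typename T>
class algo_base {
 public:
  virtual ~algo_base() = default;

  // Path of the file whose existence signals that the index was already
  // built. By default this is the configured index file itself.
  virtual std::string index_file_on_disk(const std::string& index_file) const
  {
    return index_file;
  }
};

template <typename T>
class diskann_ssd_sketch : public algo_base<T> {
 public:
  // The SSD index is saved with a "_disk.index" suffix, as stated in the thread.
  std::string index_file_on_disk(const std::string& index_file) const override
  {
    return index_file + "_disk.index";
  }
};

// Common benchmark code asks the wrapper instead of comparing algorithm names.
template <typename T>
std::string file_to_check(const algo_base<T>& algo, const std::string& index_file)
{
  assert(!index_file.empty());
  return algo.index_file_on_disk(index_file);
}
```

The common code then contains no DiskANN-specific branch, and any future algorithm with its own on-disk naming can override the same method.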


cjnolet · 2025-01-30T15:43:34Z

/ok to test

tarang-jain added 4 commits July 29, 2024 08:41

initial commit

8e8d3c1

merge 24.08

e937ebd

make build

0bbbf0d

Merge branch 'branch-24.08' of https://github.com/rapidsai/cuvs into …

02084e2

…diskann-wrapper

github-actions bot added cpp CMake labels Jul 29, 2024

Merge branch 'branch-24.10' into diskann-wrapper

64f1d60

tarang-jain added feature request New feature or request non-breaking Introduces a non-breaking change labels Jul 29, 2024

tarang-jain self-assigned this Jul 29, 2024

tarang-jain added 7 commits July 31, 2024 15:42

update wrapper

706f22e

Merge branch 'branch-24.08' of https://github.com/rapidsai/cuvs into …

3ea499b

…diskann-wrapper

diskann_memory working

e0aab8f

Merge branch 'branch-24.08' of https://github.com/rapidsai/cuvs into …

17c5510

…diskann-wrapper

Merge branch 'branch-24.10' of https://github.com/rapidsai/cuvs into …

f426df9

…diskann-wrapper

make compile

a7bdd33

Merge branch 'diskann-wrapper' of https://github.com/tarang-jain/cuvs …

d2442ca

…into diskann-wrapper

github-actions bot added the Python label Aug 3, 2024

tarang-jain and others added 12 commits August 5, 2024 11:10

rm num_threads_

dbc84cc

FEA Add cuvs-bench to dependencies and conda environments

7e37218

FIX add missing deps

b2aef6d

Merge branch 'fea-add-bench-deps' of https://github.com/dantegd/cuvs …

b9762d5

…into diskann-wrapper

FIX version and other improvements

bf75242

FEA Add cuvs_bench.run

a8bcdef

update patch;build command

3818da9

Merge branch 'cuvsbench-run' of https://github.com/dantegd/cuvs into …

cd8bfe5

…diskann-wrapper

FIX some cuvs_bench python build dependencies

ec6d70c

Merge branch 'cuvsbench-run' of https://github.com/dantegd/cuvs into …

c9f797a

…diskann-wrapper

FIX add missing algorithms.yaml

585ad53

Merge branch 'cuvsbench-run' of https://github.com/dantegd/cuvs into …

441ab2a

…diskann-wrapper

bkarsin reviewed Dec 3, 2024


tarang-jain and others added 2 commits December 3, 2024 20:22

rm cagra+diskann

f015264

Merge branch 'branch-24.12' into diskann-wrapper

6d6167d

Merge branch 'branch-24.12' into diskann-wrapper

9c2185b

tarang-jain mentioned this pull request Dec 5, 2024

[FEA] Serialize Vamana Dataset #502

Closed

tarang-jain added 2 commits December 10, 2024 12:08

Merge branch 'diskann-wrapper' of https://github.com/tarang-jain/cuvs …

2b758d3

…into diskann-wrapper

update copyright

b7ba35b

cjnolet approved these changes Dec 10, 2024


cjnolet changed the base branch from branch-24.12 to branch-25.02 December 10, 2024 20:47

tarang-jain and others added 2 commits December 11, 2024 10:14

Merge branch 'branch-25.02' into diskann-wrapper

63e02ff

Merge branch 'branch-25.02' into diskann-wrapper

48a6a9d

style

fd429d2

bkarsin approved these changes Dec 12, 2024


achirkin requested changes Dec 13, 2024


cpp/bench/ann/src/cuvs/cuvs_vamana_wrapper.h

tarang-jain and others added 4 commits January 20, 2025 10:04

Merge branch 'branch-25.02' into diskann-wrapper

7acfe51

merge upstream

c7765d9

Merge branch 'diskann-wrapper' of https://github.com/tarang-jain/cuvs …

62b207f

…into diskann-wrapper

Merge branch 'branch-25.02' into diskann-wrapper

66001c8

tarang-jain commented Jan 25, 2025


tarang-jain and others added 2 commits January 25, 2025 14:50

Merge branch 'branch-25.02' into diskann-wrapper

0d63fb0

env update

b0fd532

achirkin requested changes Jan 27, 2025


tarang-jain added 3 commits January 27, 2025 06:37

kHostMmap

47b3be6

Merge branch 'diskann-wrapper' of https://github.com/tarang-jain/cuvs …

30892f9

…into diskann-wrapper

Merge branch 'branch-25.02' of https://github.com/rapidsai/cuvs into …

9743560

…diskann-wrapper


